Method of encoding an image and device implementing said method

ABSTRACT

The invention relates to a method of encoding, into a binary stream, an image divided into non-overlapping macroblocks themselves divided into non-overlapping blocks of N by N pixels. It comprises the steps of: transforming each of the blocks into a transformed block of coefficients comprising one low frequency coefficient and N²-1 coefficients, called high frequency coefficients, of higher frequencies than the low frequency; quantizing each coefficient of each of the transformed blocks with a quantizing parameter; and encoding the quantized coefficients into a binary stream. According to the invention, the low frequency coefficients of the transformed blocks are quantized with a same quantizing parameter, called first quantizing parameter.

1. FIELD OF THE INVENTION

The invention relates to a method of encoding an image. More precisely, it concerns a method of quantizing said image. The invention also relates to an encoding device implementing said method.

2. BACKGROUND OF THE INVENTION

In order to encode an image, it is often necessary to reduce spatial redundancies. To this aim, in typical image coding methods, the image is divided into non-overlapping blocks of N×N pixels and each block is then transformed into a transformed block of coefficients. These coding methods decorrelate the image pixels so that the redundancy can be reduced more efficiently in the transform domain. In this respect, the energy compaction property of the transform is important. Among the various transforms commonly used, the Discrete Cosine Transform (DCT) is widely used for its superior energy compaction property. The transformed block represents a set of coefficients with increasing spatial frequencies. The coefficient in the top left (0,0) position of the transformed block is known as the DC coefficient, and it represents the average value of the N×N block. The other (N²-1) coefficients are known as AC coefficients, and they represent the high-frequency details. The dimension N×N can be 16×16, 8×8 or 4×4 depending on the application. In order to reduce the number of bits required to encode the image, typical image coding methods quantize the coefficients of the transformed blocks with a quantizing step. Quantization is the process of reducing the number of possible values of a quantity, thereby reducing the number of bits needed to represent it. The choice of the quantizing step is decisive to ensure a high quality of the decoded image.
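As a minimal illustration of the quantization described above, the following sketch applies a uniform scalar quantizer to a few coefficient values. The function names `quantize` and `dequantize`, the rounding rule and the sample values are assumptions made for illustration only; the present text does not prescribe a particular quantizer.

```python
import math


def quantize(coeff: float, step: float) -> int:
    """Map a coefficient to an integer level (assumed rule: round half away from zero)."""
    magnitude = math.floor(abs(coeff) / step + 0.5)
    return int(math.copysign(magnitude, coeff))


def dequantize(level: int, step: float) -> float:
    """Reconstruct an approximate coefficient value from its level."""
    return level * step


# Hypothetical coefficient values and quantizing step.
coeffs = [100.0, -37.0, 12.0, 3.0]
step = 8.0
levels = [quantize(c, step) for c in coeffs]
reconstructed = [dequantize(q, step) for q in levels]
print(levels)
print(reconstructed)
```

A larger step leaves fewer distinct levels to encode, at the cost of a larger reconstruction error, which is bounded here by half the step.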

3. SUMMARY OF THE INVENTION

The invention aims at providing a solution for the quantization of an image.

To this aim, the invention relates to a method of encoding, into a binary stream, an image divided into non-overlapping macroblocks themselves divided into non-overlapping blocks of N by N pixels. It comprises the steps of:

transforming each of the blocks into a transformed block of coefficients comprising one low frequency coefficient and N²-1 coefficients, called high frequency coefficients, of higher frequencies than the low frequency;

quantizing each coefficient of each of the transformed blocks with a quantizing parameter;

encoding the quantized coefficients into a binary stream. According to the invention, the low frequency coefficients of the transformed blocks are quantized with a same quantizing parameter, called first quantizing parameter. This advantageously makes it possible to ensure a continuous basic quality over the whole image.

Advantageously, the high frequency coefficients of a same macroblock are quantized with a same quantizing parameter, called second quantizing parameter.

Preferentially, the second quantizing parameter is computed as the sum of the first quantizing parameter and of an increment, the increment being determined based on a visual perceptual interest value computed for the macroblock.

According to a first aspect of the invention, the visual perceptual interest value depends on the average luminance value of the macroblock.

According to a variant, the visual perceptual interest value also depends on the variance of each block of the macroblock.

According to another variant, the visual perceptual interest value also depends on the chrominance information of the macroblock.

The invention further relates to a device for encoding, into a binary stream, an image divided into non-overlapping macroblocks themselves divided into non-overlapping blocks of N by N pixels. This device comprises:

transform means for transforming each of the blocks into a transformed block of coefficients comprising one low frequency coefficient and N²-1 coefficients, called high frequency coefficients, of higher frequencies than the low frequency;

quantizing means for quantizing each coefficient of each of the transformed blocks with a quantizing parameter;

encoding means for encoding the quantized coefficients into a binary stream.

According to the invention, the quantizing means quantize the low frequency coefficients of the transformed blocks with a same quantizing parameter, called first quantizing parameter.

According to an aspect of the invention, the transform means are a DCT transform unit.

4. BRIEF DESCRIPTION OF THE DRAWINGS

Other features and advantages of the invention will appear with the following description of some of its embodiments, this description being made in connection with the drawings in which:

FIG. 1 depicts a flowchart of the method according to a first embodiment of the invention;

FIG. 2 depicts a detailed flowchart of the quantization step of the method according to a first embodiment of the invention;

FIG. 3 depicts a detailed flowchart of the features extraction step of the method according to a first embodiment of the invention;

FIG. 4 depicts, for regions within an image, the level of visual perceptual interest of these regions using different grey levels;

FIG. 5 depicts a flowchart of the method according to a second embodiment of the invention;

FIG. 6 depicts a coding device according to a first embodiment of the invention; and

FIG. 7 depicts a coding device according to a second embodiment of the invention.

5. DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS

In FIGS. 1 to 3 and FIG. 5, the represented boxes are purely functional entities, which do not necessarily correspond to physically separate entities. Namely, they could be developed in the form of software, or be implemented in one or several integrated circuits. In these figures, similar elements are referenced with the same numbers.

The invention relates to a method of encoding an image. The image is made up of pixels and divided into non-overlapping macroblocks M_(l) themselves divided into blocks b_(p,q) of pixels, where l is the index of the macroblock and (p,q) are the coordinates of the block, i.e. the block b_(p,q) is located in the p^(th) column of blocks and in the q^(th) line of blocks. In many applications, a block b_(p,q) is a block of 8 pixels by 8 pixels and a macroblock is made up of 4 blocks b_(p,q). However, the invention is not limited to this case and may be used whatever the size of the blocks b_(p,q) and the size of the macroblocks M_(l) are.
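The division of the image into macroblocks and blocks may be sketched as follows, for the common case of 8×8 blocks grouped into 16×16 macroblocks. The function name `split_image`, the representation of the image as a list of pixel rows, the raster-order numbering of the macroblock index l and the requirement that the image dimensions be multiples of the macroblock size are all assumptions of this sketch, not features stated in the text.

```python
def split_image(image, n=8, mb_size=16):
    """Group the n x n blocks b_(p,q) of an image by macroblock index l.

    Returns a dict {l: {(p, q): block}} where p is the block column,
    q is the block line, and l is an assumed raster-order macroblock index.
    """
    height, width = len(image), len(image[0])
    if height % mb_size or width % mb_size:
        raise ValueError("image dimensions must be multiples of the macroblock size")
    blocks_per_mb_side = mb_size // n
    mbs_per_row = width // mb_size
    macroblocks = {}
    for q in range(height // n):          # block line
        for p in range(width // n):       # block column
            block = [row[p * n:(p + 1) * n] for row in image[q * n:(q + 1) * n]]
            l = (q // blocks_per_mb_side) * mbs_per_row + (p // blocks_per_mb_side)
            macroblocks.setdefault(l, {})[(p, q)] = block
    return macroblocks


# Hypothetical 32 x 16 test image whose pixel value encodes its position.
image = [[x + 100 * y for x in range(32)] for y in range(16)]
mbs = split_image(image)
print(len(mbs), sorted(mbs[0]))
```

With these sizes each macroblock holds four blocks, matching the case described above.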

A first embodiment of the method is described with reference to FIGS. 1, 2 and 3. The method comprises a step 10 for transforming each block b_(p,q) of the image into a transformed block B_(p,q) of coefficients. If a block b_(p,q) is located in a macroblock M_(l), then the transformed block B_(p,q) is located in the corresponding transformed macroblock denoted MB_(l). Each block b_(p,q) is, for example, transformed into a transformed block B_(p,q) by a Discrete Cosine Transform (DCT) according to the following equation:

$B_{p,q}(u,v) = \frac{4\,Q(u)\,Q(v)}{N^{2}} \sum\limits_{j=0}^{N-1} \sum\limits_{k=0}^{N-1} b_{p,q}(j,k) \cdot \cos\left[ \frac{(2j+1)\,u\,\pi}{2N} \right] \cdot \cos\left[ \frac{(2k+1)\,v\,\pi}{2N} \right],$

where:

- u, v = 0, 1, . . . , N−1;
- b_(p,q)(j,k) is the luminance value (or chrominance value) of the pixel of block b_(p,q) whose coordinates are (j,k);
- B_(p,q)(u,v) is the value of the coefficient of transformed block B_(p,q) whose coordinates are (u,v); and
- $Q(w) = \left\{ \begin{matrix} \frac{1}{\sqrt{2}}, & \mathrm{if}\ w = 0 \\ 1, & \mathrm{if}\ w = 1, 2, \ldots, N-1 \end{matrix} \right.$

At step 20, each coefficient B_(p,q)(u,v) of each transformed block B_(p,q) is quantized with a quantizing parameter QP_(p,q)(u,v) into a quantized coefficient B^(Q)_(p,q)(u,v). According to a first aspect of the invention, all DC coefficients in the image are quantized 210 with a same quantizing parameter QP_(f), i.e. ∀(p,q), QP_(p,q)(0,0)=QP_(f). This makes it possible to ensure a continuous basic quality over the whole image. This quantizing parameter QP_(f) is for example determined by a rate control method such as the well known TM5 rate control method described for MPEG2 in the document N0400 from ISO/IEC JTC1/SC29/WG11 entitled “Test Model 5” dated April 1993. According to another aspect of the invention, all the AC coefficients of all the blocks B_(p,q) located in the same macroblock MB_(l) are quantized 220 with a same quantizing parameter QP_(l), which may or may not be equal to QP_(f). According to a specific characteristic, the quantizing parameter used to quantize all these AC coefficients equals QP_(l)=QP_(f)+ΔQP_(l), where ΔQP_(l) is determined on the basis of the perceptual interest of the macroblock MB_(l). In this case, for all the AC coefficients in MB_(l), QP_(p,q)(u,v)=QP_(l)=QP_(f)+ΔQP_(l). According to the first embodiment, ΔQP_(l)=−W_(l)·QP_(step), where QP_(step) is set, for example, to 2. The weight W_(l) is computed 30 by combining 340 three different features F_(l)^(1), F_(l)^(2) and F_(l)^(3) computed from the macroblock M_(l) at steps 310, 320 and 330 respectively.

The first feature F_(l)^(1) associated with each macroblock M_(l) in the image takes its value in the set {0,1}. If F_(l)^(1)=1, then the macroblock M_(l) belongs to a skin region, e.g. a human face; otherwise, i.e. F_(l)^(1)=0, the macroblock does not belong to a skin region. To compute 310 this feature, it is required to detect the skin tones. To this aim, the skin region in the image, if any, may be detected using a method of color segmentation such as the one proposed by D. Cai and K. N. Ngan in the document entitled “Face segmentation using skin color map in videophone applications” published in IEEE Transactions on CSVT in 1999. This feature makes it possible to quantize more finely the skin regions, which are important to human eyes from a visual quality point of view.

The second feature F_(l)^(2) associated with each macroblock M_(l) in the image takes its value in the set {0,1}. If F_(l)^(2)=1, then the macroblock M_(l) is a flat region; otherwise it is not. To this aim, the variance v_(k) of each block b_(p,q) located in macroblock M_(l) is used to compute 320 the value act_(l) as follows:

act_(l) = 1 + min(v_(1), v_(2), . . . , v_(k))

where v_(k) is the variance of the k^(th) block located in the macroblock M_(l). If act_(l) is smaller than a predefined threshold THT, the macroblock M_(l) is a flat region, i.e. F_(l)^(2)=1, otherwise F_(l)^(2)=0. THT can be set to 8 in most applications. This feature makes it possible to quantize more finely the flat regions, i.e. regions with little detail. Indeed, visual defects in such flat regions are more annoying for human eyes.

The third feature F_(l)^(3) associated with each macroblock M_(l) in the image takes its value in the set {0,1}. To compute 330 the value F_(l)^(3), the average luminance level over the macroblock M_(l), denoted luma, is compared to two thresholds THL and THH which can be set to 23 and 1952 respectively. A value named level is then derived as follows:

$\mathrm{level} = \left\{ \begin{matrix} 0, & \mathrm{if}\ 0 \leq \mathrm{luma} < THL \\ 1, & \mathrm{if}\ THL \leq \mathrm{luma} \leq THH \\ 2, & \mathrm{if}\ THH < \mathrm{luma} \leq 255 \end{matrix} \right.$

F_(l)^(3)=0 if level=0 or 2, and F_(l)^(3)=1 if level=1. This feature makes it possible to quantize more finely the mid-tone regions, to which human eyes are more sensitive from a visual quality point of view. The three features are combined 340 as follows in the weight W_(l):

$W_{l} = \left\{ \begin{matrix} 3, & \mathrm{if\ macroblock}\ M_{l}\ \mathrm{has\ 3\ features\ that\ equal\ 1} \\ 2, & \mathrm{if\ macroblock}\ M_{l}\ \mathrm{has\ only\ 2\ features\ that\ equal\ 1} \\ 1, & \mathrm{if\ macroblock}\ M_{l}\ \mathrm{has\ only\ 1\ feature\ that\ equals\ 1} \\ -2, & \mathrm{if\ macroblock}\ M_{l}\ \mathrm{has\ no\ feature\ that\ equals\ 1} \end{matrix} \right.$
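The flat-region test, the luminance test and their combination into the weight W_(l) may be sketched as follows. This is only an illustrative sketch: the function names are hypothetical, the skin feature F_(l)^(1) is taken as an input because skin detection is not implemented here, and the upper luminance threshold is left as a parameter (the value 200 used in the example is an assumption, not a value from the text).

```python
from statistics import pvariance

THT = 8   # flat-region threshold given in the text
THL = 23  # lower luminance threshold given in the text


def block_variance(block):
    """Population variance of the pixel values of one block."""
    return pvariance([pixel for row in block for pixel in row])


def flat_feature(block_variances, tht=THT):
    """Second feature: 1 if act = 1 + min(variances) is below the threshold."""
    act = 1 + min(block_variances)
    return 1 if act < tht else 0


def luminance_feature(mean_luma, thl, thh):
    """Third feature: 1 for mid-tones (level = 1), 0 otherwise (level = 0 or 2)."""
    return 1 if thl <= mean_luma <= thh else 0


def weight(f1, f2, f3):
    """W: number of features equal to 1, or -2 when none equals 1."""
    count = f1 + f2 + f3
    return count if count > 0 else -2


def delta_qp(w, qp_step=2):
    """First embodiment: delta QP = -W * QP_step."""
    return -w * qp_step


# Hypothetical macroblock: skin detected, one nearly flat block, mid-tone luminance.
f2 = flat_feature([3.0, 50.0, 9.0, 12.0])
f3 = luminance_feature(100, THL, 200)  # 200 is an assumed upper threshold
w = weight(1, f2, f3)
print(w, delta_qp(w))
```

A macroblock exhibiting all three features thus receives the largest reduction of its quantizing parameter, while a macroblock exhibiting none has its quantizing parameter raised.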

This is illustrated by FIG. 4.
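The assignment of quantizing parameters at step 20 can be sketched as a per-coefficient map in which the DC position of every block carries QP_(f) and all AC positions of the blocks of a macroblock carry QP_(l)=QP_(f)+ΔQP_(l). The function name `macroblock_qp_maps` and the numerical values (QP_(f)=26, ΔQP_(l)=−6) are assumptions chosen for illustration.

```python
def macroblock_qp_maps(qp_f, delta_qp_l, n=8, blocks_per_mb=4):
    """Return one n x n map of quantizing parameters per block of a macroblock.

    The DC position (0, 0) uses the image-wide parameter qp_f; every AC
    position uses the macroblock parameter qp_l = qp_f + delta_qp_l.
    """
    qp_l = qp_f + delta_qp_l
    maps = []
    for _ in range(blocks_per_mb):
        qp = [[qp_l] * n for _ in range(n)]
        qp[0][0] = qp_f
        maps.append(qp)
    return maps


# Hypothetical values: qp_f = 26 and a weight of 3 with a step of 2 (delta = -6).
maps = macroblock_qp_maps(26, -6)
print(maps[0][0][:3])
```

Because the DC entry is independent of the macroblock, it is identical across the whole image, whereas the AC entries vary from one macroblock to the next.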

The quantized coefficients are then encoded at step 40, for example by a classical entropy coding method such as the one described at section 7.9 in a document from ISO/IEC 14496-10 entitled “Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding”.

A second embodiment of the method is described with reference to FIGS. 2 and 5. The method comprises a step 10 for transforming each block b_(p,q) of the image into a transformed block B_(p,q) of coefficients. If a block b_(p,q) is located in a macroblock M_(l), then the transformed block B_(p,q) is located in the corresponding transformed macroblock denoted MB_(l). Each block b_(p,q) is, for example, transformed into a transformed block B_(p,q) by a Discrete Cosine Transform (DCT) according to the following equation:

$B_{p,q}(u,v) = \frac{4\,Q(u)\,Q(v)}{N^{2}} \sum\limits_{j=0}^{N-1} \sum\limits_{k=0}^{N-1} b_{p,q}(j,k) \cdot \cos\left[ \frac{(2j+1)\,u\,\pi}{2N} \right] \cdot \cos\left[ \frac{(2k+1)\,v\,\pi}{2N} \right],$

where:

- u, v = 0, 1, . . . , N−1;
- b_(p,q)(j,k) is the luminance value (or chrominance value) of the pixel of block b_(p,q) whose coordinates are (j,k);
- B_(p,q)(u,v) is the value of the coefficient of transformed block B_(p,q) whose coordinates are (u,v); and
- $Q(w) = \left\{ \begin{matrix} \frac{1}{\sqrt{2}}, & \mathrm{if}\ w = 0 \\ 1, & \mathrm{if}\ w = 1, 2, \ldots, N-1 \end{matrix} \right.$

At step 20, each coefficient B_(p,q)(u,v) of each transformed block B_(p,q) is quantized with a quantizing parameter QP_(p,q)(u,v) into a quantized coefficient B^(Q)_(p,q)(u,v). According to a first aspect of the invention, all DC coefficients in the image are quantized 210 with a same quantizing parameter QP_(f), i.e. ∀(p,q), QP_(p,q)(0,0)=QP_(f). This makes it possible to ensure a continuous basic quality over the whole image. This quantizing parameter QP_(f) is for example determined by a rate control method such as the well known TM5 rate control method described for MPEG2 in the document N0400 from ISO/IEC JTC1/SC29/WG11 entitled “Test Model 5” dated April 1993. According to another aspect of the invention, all the AC coefficients of all the blocks B_(p,q) located in the same macroblock MB_(l) are quantized 220 with a same quantizing parameter QP_(l), which may or may not be equal to QP_(f). According to a specific characteristic, the quantizing parameter used to quantize all these AC coefficients equals QP_(l)=QP_(f)+ΔQP_(l), where ΔQP_(l) is determined on the basis of the perceptual interest of the macroblock MB_(l). In this case, for all the AC coefficients in MB_(l), QP_(p,q)(u,v)=QP_(l)=QP_(f)+ΔQP_(l). According to the second embodiment,

$\Delta QP_{l} = \frac{V_{l} - 128}{128} \cdot QP'_{step},$

where V_(l) is the importance level of the macroblock M_(l). In this case QP′_(step) may be set to 6. V_(l) is computed at step 31 from the macroblock M_(l) and is clipped so that it lies in the range [0, 255]. To this aim, the information map described by N. Bruce in the document entitled “Image analysis through local information measure” published in the Proceedings of the 17^(th) International Conference on Pattern Recognition, pp. 616-619, August 2004 is computed 31 for the whole image. This map assigns an importance level V_(l) to each macroblock M_(l). This method makes it possible to identify the perceptually important regions in the image by analyzing the properties of local image statistics in a classic information theoretic setting. It therefore makes it possible to adjust the quantization parameter more continuously and thereby improves the visual quality of the decoded image.
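The increment of the second embodiment can be sketched as below. The function name `delta_qp_from_importance` is hypothetical, the importance level is taken as an input (the information map itself is not computed here), and the result is left as a real number because the text does not specify how it is rounded to an integer quantizing parameter.

```python
def delta_qp_from_importance(v, qp_step_prime=6):
    """Second embodiment: delta QP = (V - 128) / 128 * QP'_step, with V clipped to [0, 255]."""
    v = max(0, min(255, v))
    return (v - 128) / 128 * qp_step_prime


# Hypothetical importance levels spanning the clipped range.
for v in (0, 64, 128, 255, 300):
    print(v, delta_qp_from_importance(v))
```

Unlike the four-valued weight of the first embodiment, this increment varies almost continuously with the importance level, between −QP′_(step) and just under +QP′_(step).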

The quantized coefficients are then encoded at step 40, for example by a classical entropy coding method such as the one described at section 7.9 in a document from ISO/IEC 14496-10 entitled “Information technology - Coding of audio-visual objects - Part 10: Advanced Video Coding”.

This solution makes it possible to get a smoothly distributed perceptual quality improvement with little change in the overall bitrate.

The invention also relates to a coding device CODEC depicted in FIGS. 6 and 7. In these figures, similar elements are referenced with the same numbers. The device CODEC comprises a module T for transforming each block b_(p,q) of the image into a transformed block B_(p,q) of coefficients. It implements for example a DCT. The module T is linked to a module Q adapted for quantizing each coefficient B_(p,q)(u,v) of each transformed block B_(p,q) into a quantized coefficient B^(Q)_(p,q)(u,v). The module Q implements step 20 of the method described above. To this end, the device CODEC comprises a module ROI₁ linked to the module Q and adapted to compute a weight W_(l) for each macroblock M_(l) of the image, for example by implementing steps 310 to 340 of the method described above. According to another characteristic depicted in FIG. 7, the module ROI₁ in the device CODEC is replaced by a module ROI₂ adapted to compute an importance level V_(l) for each macroblock M_(l) of the image, for example by implementing the method described in the document entitled “Image analysis through local information measure” published in the Proceedings of the 17^(th) International Conference on Pattern Recognition, pp. 616-619, August 2004. The module Q is further linked to a module COD adapted for encoding the quantized coefficients B^(Q)_(p,q)(u,v). It is preferentially an entropy coder.

The invention described for an image may be advantageously applied to a video, more precisely to each image of the video.

1. Method of encoding an image divided into non-overlapping macroblocks themselves divided into non-overlapping blocks of N by N pixels into a binary stream comprising the steps of: transforming each of said blocks into a transformed block of coefficients comprising one low frequency coefficient and N²-1 coefficients, called high frequency coefficients, of higher frequencies than said low frequency; quantizing each coefficient of each of said transformed blocks with a quantizing parameter; encoding said quantized coefficients into a binary stream; wherein the low frequency coefficients of said transformed blocks are quantized with a same quantizing parameter called first quantizing parameter.
2. Method according to claim 1, wherein the high frequency coefficients of a same macroblock are quantized with a same quantizing parameter, called second quantizing parameter.
3. Method according to claim 2, wherein said second quantizing parameter is computed as the sum of said first quantizing parameter and of an increment, said increment being determined based on a visual perceptual interest value computed for said macroblock.
4. Method according to claim 3, wherein said visual perceptual interest value depends on the average luminance value of said macroblock.
5. Method according to claim 3, wherein said visual perceptual interest value depends on the variance of each block of said macroblock.
6. Method according to claim 3, wherein said visual perceptual interest value depends on the chrominance information of said macroblock.
7. Device for encoding an image divided into non-overlapping macroblocks themselves divided into non-overlapping blocks of N by N pixels into a binary stream comprising: transform means for transforming each of said blocks into a transformed block of coefficients comprising one low frequency coefficient and N²-1 coefficients, called high frequency coefficients, of higher frequencies than said low frequency; quantizing means for quantizing each coefficient of each of said transformed blocks with a quantizing parameter; encoding means for encoding said quantized coefficients into a binary stream; wherein said quantizing means quantize the low frequency coefficients of said transformed blocks with a same quantizing parameter, called first quantizing parameter.
8. Device according to claim 7, wherein said transform means are a DCT transform unit.